{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<br/>\n",
    "\n",
    "# Part 1: A Good Teacher Is Helpful For All Students -- Custom Loss\n",
    "\n",
    "<br/>\n",
    "\n",
    "Basically, the finally score is an average of 4 AUC. 3 of them only take into account parts of the dataset that depending on whether the comment mentions word like 'gay' and whether it's toxic.\n",
    "\n",
    "![metric.png](Pictures/metric.png)",
    "![metric2.png](Pictures/metric2.png)",
    "\n",
    "<br/>\n",
    "Because of the 4 AUC average evaluation metric, we try to make a custom loss fuction instead of just using the binary cross entropy. \n",
    "\n",
    "## There are 2 main change of the loss fuction:\n",
    "\n",
    "### 1. weight each sample\n",
    "\n",
    "The main idea is:<br/>\n",
    "Each sample participates in some of these AUC. **A sample that participates in 3 terms is more important than a sample that participates in 2 terms** since giving a bad score to that sample affects the overall score more.\n",
    "\n",
    "What we do is as following:\n",
    "We calculate the weight of each sample base on how many AUC they belong to.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Overall\n",
    "weights = np.ones((len(train_df),)) / 4\n",
    "\n",
    "# Subgroup\n",
    "weights += (train_df[identity_columns].fillna(0).values>=0.5).sum(axis=1).astype(bool).astype(np.int) / 4\n",
    "\n",
    "# Background Positive, Subgroup Negative\n",
    "weights += (( (train_df['target'].values>=0.5).astype(bool).astype(np.int) +\n",
    "   (train_df[identity_columns].fillna(0).values<0.5).sum(axis=1).astype(bool).astype(np.int) ) > 1 ).astype(bool).astype(np.int) / 4\n",
    "\n",
    "# Background Negative, Subgroup Positive\n",
    "weights += (( (train_df['target'].values<0.5).astype(bool).astype(np.int) +\n",
    "   (train_df[identity_columns].fillna(0).values>=0.5).sum(axis=1).astype(bool).astype(np.int) ) > 1 ).astype(bool).astype(np.int) / 4\n",
    "\n",
    "# for later normalization the loss\n",
    "loss_weight = 1.0 / weights.mean()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 2. Auxiliary Target \n",
    "<br/>\n",
    "Actually, the competition dataset not only have the toxicity column that can be treated as target, there are other columns that is highly correlated to target too.\n",
    "There are 6 more columns that is highly correlated with target column:\n",
    "\n",
    "**['severe_toxicity', 'obscene', 'identity_attack', 'insult', 'threat','sexual_explicit']**\n",
    "\n",
    "For more explaination of the dataset, please check [here](https://www.kaggle.com/c/jigsaw-unintended-bias-in-toxicity-classification/data).\n",
    "\n",
    "What's more the official baseline (score about 0.87) just convert toxicity >= 0.5 to 1 and others 0, which may lose some useful information for the model. So we also use the toxicity probability as auxiliary target."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "y_columns = ['target'] #0/1\n",
    "y_aux_columns = \\\n",
    "['target_prob','target_prob','severe_toxicity', 'obscene', 'identity_attack', 'insult', 'threat','sexual_explicit']\n",
    "# two target_prob is for adjusting the weight of aux_columns"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def custom_loss(preds,targets,weights):\n",
    "    ''' Define custom loss function for weighted BCE on 'target' column '''\n",
    "    bce_loss_1 = nn.BCEWithLogitsLoss(weight=weights)(preds[:,0],targets[:,0]) #weighted y_columns\n",
    "    bce_loss_2 = nn.BCEWithLogitsLoss()(preds[:,1:],targets[:,1:]) # y_aux_columns\n",
    "    return ((bce_loss_1 * loss_weight)*0.60 + bce_loss_2*0.40)*2 "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    " It turn out that not only the custom loss fuction works well (boost LSTM model AUC:0.930 -> 0.934 when doing experiment), when we use the custom loss for fine-tuning Bert & GPT2, it also gives the model great boost. "
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.8"
  },
  "toc": {
   "base_numbering": 1,
   "nav_menu": {},
   "number_sections": true,
   "sideBar": true,
   "skip_h1_title": false,
   "title_cell": "Table of Contents",
   "title_sidebar": "Contents",
   "toc_cell": false,
   "toc_position": {},
   "toc_section_display": true,
   "toc_window_display": false
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
